Learning Qualitative Markov Decision Processes
Authors
Abstract
To navigate in natural environments, a robot must decide the best action to take according to its current situation and goal, a problem that can be represented as a Markov Decision Process (MDP). In general, it is assumed that the user can provide the system with a reasonable state representation and transition model. When dealing with complex domains, however, it is not always easy or even possible to provide such information. In this paper, a system is described that can automatically produce a state abstraction and learn a transition function over the abstracted states, called qualitative states (q-states). A qualitative state is a group of states with similar properties and rewards; q-states are induced from the reward function using decision trees. The transition model, represented as a factored MDP, is learned using a Bayesian network learning algorithm. This combined learning process produces a very compact MDP that can be efficiently solved using standard techniques. We show experimentally that this approach can efficiently learn a reasonable policy for a mobile robot in large and complex domains.
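The abstract-then-solve pipeline described above can be sketched in miniature: partition states into q-states by reward, estimate abstract transition probabilities, and run value iteration on the resulting compact MDP. Everything below is an illustrative assumption, not the authors' implementation: the toy corridor domain, the simple reward-based partition (standing in for the decision-tree induction), and all names are invented for this sketch.

```python
# Hedged sketch of the q-state idea: a toy 1-D corridor MDP is abstracted
# into two qualitative states ("goal" vs. "other") by grouping states with
# equal reward, then value iteration is run on the abstract MDP.
# The domain, the partition rule, and all names are illustrative only.
from collections import defaultdict

# Toy corridor: states 0..5, reward 1.0 at the goal state 5, else 0.
states = list(range(6))
reward = {s: (1.0 if s == 5 else 0.0) for s in states}
actions = ["left", "right"]

def step(s, a):
    """Deterministic concrete dynamics of the corridor."""
    return max(0, s - 1) if a == "left" else min(5, s + 1)

def qstate(s):
    """Reward-based partition, standing in for the decision-tree induction."""
    return "goal" if reward[s] > 0 else "other"

# Estimate abstract transition counts N(q' | q, a) from the concrete model.
counts = defaultdict(lambda: defaultdict(int))
for s in states:
    for a in actions:
        counts[(qstate(s), a)][qstate(step(s, a))] += 1

def p(q, a, q2):
    """Abstract transition probability P(q' | q, a) from the counts."""
    total = sum(counts[(q, a)].values())
    return counts[(q, a)][q2] / total

# Value iteration over the two abstract states.
qs = ["other", "goal"]
r = {"other": 0.0, "goal": 1.0}
gamma = 0.9
V = {q: 0.0 for q in qs}
for _ in range(100):
    V = {q: r[q] + gamma * max(sum(p(q, a, q2) * V[q2] for q2 in qs)
                               for a in actions)
         for q in qs}

print(sorted(V.items()))
```

The point of the sketch is the size reduction: the abstract MDP has two states instead of six, yet standard value iteration still produces sensible values (the goal q-state ends up more valuable than the rest of the corridor).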
Similar resources
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences
In a standard Markov decision process (MDP), rewards are assumed to be precisely known and of quantitative nature. This can be too strong a hypothesis in some situations. When rewards can really be modeled numerically, specifying the reward function is often difficult, as it is a cognitively demanding and/or time-consuming task. Besides, rewards can sometimes be of qualitative nature, as when the...
Controller Synthesis and Verification for Markov Decision Processes with Qualitative Branching Time Objectives
We show that the controller synthesis and verification problems for Markov decision processes with qualitative PECTL∗ objectives are 2-EXPTIME complete. More precisely, the algorithms are polynomial in the size of a given Markov decision process and doubly exponential in the size of a given qualitative PECTL∗ formula. Moreover, we show that if a given qualitative PECTL∗ objective is achievable ...
An Analysis of Families and Community's Involvement in The Schools' Educational and Administrative Processes Using "Overlapping Spheres of Influence" Model
Abstract The overall purpose of this research was to analyze families' and the community's involvement in the educational and managerial processes of schools using the "Overlapping Spheres of Influence" theoretical model. The research method was an explanatory mixed (quantitative-qualitative) design. The statistical population consisted of all teachers, administrators and parents of elementary...
Journal:
Volume / Issue
Pages -
Publication date: 2005